Bayesian network structure learning with causal effects in the presence of latent variables.
Latent variables may lead to spurious relationships that can be
misinterpreted as causal relationships. In Bayesian Networks (BNs), this
challenge is known as learning under causal insufficiency. Structure learning
algorithms that assume causal insufficiency tend to reconstruct the ancestral
graph of a BN, where bi-directed edges represent confounding and directed edges
represent direct or ancestral relationships. This paper describes a hybrid
structure learning algorithm, called CCHM, which combines the constraint-based
part of cFCI with hill-climbing score-based learning. The score-based process
incorporates Pearl's do-calculus to measure causal effects and orientate edges
that would otherwise remain undirected, under the assumption that the BN is a
linear Structural Equation Model where data follow a multivariate Gaussian
distribution. Experiments based on both randomised and well-known networks show
that CCHM improves the state-of-the-art in terms of reconstructing the true
ancestral graph
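In a linear Gaussian SEM, the interventional effect of one variable on another, as given by do-calculus, reduces to an ordinary regression coefficient once a valid back-door adjustment set is included. The sketch below illustrates only this general principle, not the CCHM implementation; the toy SEM, coefficients, and function names are invented for illustration:

```python
import numpy as np

rng = np.random.default_rng(0)
n = 5000

# Toy linear Gaussian SEM: Z -> X, Z -> Y, X -> Y.
# The true causal effect of X on Y is 2.0; Z confounds the pair.
Z = rng.normal(size=n)
X = 0.8 * Z + rng.normal(size=n)
Y = 2.0 * X + 1.5 * Z + rng.normal(size=n)

def effect_of_x_on_y(X, Y, adjust):
    """OLS coefficient of X when regressing Y on [X] + adjust + intercept."""
    A = np.column_stack([X] + adjust + [np.ones(len(X))])
    coef, *_ = np.linalg.lstsq(A, Y, rcond=None)
    return coef[0]

naive = effect_of_x_on_y(X, Y, [])       # confounded: overestimates the effect
adjusted = effect_of_x_on_y(X, Y, [Z])   # back-door adjusted: close to 2.0
```

Here the adjusted estimate recovers the true coefficient, while the unadjusted one absorbs the confounding path through Z, which is the kind of spurious relationship the abstract warns about.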
Bayesian network structure learning in the presence of latent variables
A causal Bayesian Network (BN) is a probabilistic graphical model that captures causal or conditional relationships between variables, and enables causal reasoning under uncertainty. Causal reasoning via graphical representation in turn enables interpretability and full transparency in decision-making, and this makes causal BNs suitable for modelling critical real-world problems that require explainability, such as in healthcare, environmental sciences, government policy and economics. Learning accurate causal structure from data represents a notoriously difficult task, and this difficulty increases with any imperfections present in the input data. For example, real data tend not to capture all relevant variables needed for causal representation, and these missing variables are referred to as hidden or latent variables. If some of the latent variables are latent confounders (i.e., missing common causes), they would confound the effect variables, thereby leading to spurious relationships in the learnt structure that could be misinterpreted as causal relationships. While the relevant literature includes structure learning algorithms that are capable of learning causal structure from data with latent variables, it is fair to say that accurate structural discovery from real data remains an open problem. This thesis studies structure learning algorithms that recover graphical structure from data, and primarily focuses on the problem of latent variables. It investigates new solutions, including structure learning algorithms that learn from both observational and interventional data, approaches for density estimation that can be used to recover the underlying distribution of possible latent confounders, and techniques for hyperparameter optimisation of structure learning algorithms. 
The thesis explores this set of new approaches by applying them to a range of synthetic and real datasets of varying size, dimensionality, and data noise, and concludes by highlighting open problems and directions for future research
Large-scale empirical validation of Bayesian Network structure learning algorithms with noisy data.
Numerous Bayesian Network (BN) structure learning algorithms have been proposed in the literature over the past few decades. Each publication makes an empirical or theoretical case for the algorithm it proposes, and results across studies are often inconsistent in their claims about which algorithm is ‘best’. This is partly because there is no agreed evaluation approach to determine their effectiveness. Moreover, each algorithm is based on a set of assumptions, such as complete data and causal sufficiency, and tends to be evaluated with data that conform to these assumptions, however unrealistic these assumptions may be in the real world. As a result, it is widely accepted that synthetic performance overestimates real performance, although to what degree this may happen remains unknown. This paper investigates the performance of 15 state-of-the-art, well-established, or recent promising structure learning algorithms. We propose a methodology that applies the algorithms to data that incorporate synthetic noise, in an effort to better understand the performance of structure learning algorithms when applied to real data. Each algorithm is tested over multiple case studies, sample sizes, and types of noise, and assessed with multiple evaluation criteria. This work involved learning approximately 10,000 graphs with a total structure learning runtime of seven months. In investigating the impact of data noise, we provide the first large-scale empirical comparison of BN structure learning algorithms under different assumptions of data noise. The results suggest that traditional synthetic performance may overestimate real-world performance by anywhere between 10% and more than 50%. They also show that while score-based learning is generally superior to constraint-based learning, a higher fitting score does not necessarily imply a more accurate causal graph. 
The comparisons extend to other outcomes of interest, such as runtime, reliability, and resilience to noise, assessed over both small and large networks, and with both limited and big data. To facilitate comparisons with future studies, we have made all data, raw results, graphs and BN models freely available online
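The abstract does not specify the noise model used, but a common way to corrupt clean synthetic data is to combine random measurement error with missing-completely-at-random cells. The following is a hypothetical sketch of such an injection step; the function name, rates, and data are invented for illustration:

```python
import numpy as np

rng = np.random.default_rng(1)

def add_noise(data, p_missing=0.05, p_error=0.05, n_states=3):
    """Inject MCAR missingness and random measurement error into a
    discrete data matrix whose cells are integer state indices."""
    noisy = data.astype(float).copy()
    # measurement error: overwrite a cell with a randomly drawn state
    err = rng.random(data.shape) < p_error
    noisy[err] = rng.integers(0, n_states, size=err.sum())
    # missingness: NaN marks a missing observation
    miss = rng.random(data.shape) < p_missing
    noisy[miss] = np.nan
    return noisy

clean = rng.integers(0, 3, size=(1000, 5))  # 1000 cases, 5 discrete variables
noisy = add_noise(clean)
```

An algorithm's synthetic-versus-real performance gap can then be probed by learning from `clean` and `noisy` versions of the same data and comparing the recovered graphs.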
Effective and efficient structure learning with pruning and model averaging strategies
Learning the structure of a Bayesian Network (BN) with score-based solutions
involves exploring the search space of possible graphs and moving towards the
graph that maximises a given objective function. Some algorithms offer exact
solutions that guarantee to return the graph with the highest objective score,
while others offer approximate solutions in exchange for reduced computational
complexity. This paper describes an approximate BN structure learning
algorithm, which we call Model Averaging Hill-Climbing (MAHC), that combines
two novel strategies with hill-climbing search. The algorithm starts by pruning
the search space of graphs, where the pruning strategy can be viewed as an
aggressive version of the pruning strategies that are typically applied to
combinatorial optimisation structure learning problems. It then performs model
averaging in the hill-climbing search process and moves to the neighbouring
graph that maximises the objective function on average, computed over that
graph and all of its valid neighbours. Comparisons with other
algorithms spanning different classes of learning suggest that the combination
of aggressive pruning with model averaging is both effective and efficient,
particularly in the presence of data noise
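The neighbour-averaged move rule can be sketched abstractly: instead of ranking each candidate by its own score, rank it by the mean score over the candidate and its valid neighbours. The code below is a schematic of this idea on an arbitrary search space, not the MAHC implementation; the toy one-dimensional objective in the usage note is invented:

```python
def averaged_score(state, score, neighbours):
    """Mean objective over a state and all of its valid neighbours."""
    vals = [score(state)] + [score(s) for s in neighbours(state)]
    return sum(vals) / len(vals)

def model_averaging_hill_climb(start, score, neighbours):
    """Greedy search that moves to the neighbour with the best
    *averaged* score rather than the best raw score."""
    current = start
    while True:
        best = max(neighbours(current),
                   key=lambda s: averaged_score(s, score, neighbours))
        if averaged_score(best, score, neighbours) <= \
           averaged_score(current, score, neighbours):
            return current  # no averaged improvement available
        current = best
```

For example, with `score = lambda x: -(x - 3) ** 2` and `neighbours = lambda x: [x - 1, x + 1]`, the search started at 0 climbs to 3. Averaging over a candidate's neighbourhood smooths the objective, which is one intuition for why such a rule can be more robust to data noise.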
Open problems in causal structure learning: A case study of COVID-19 in the UK
Causal machine learning (ML) algorithms recover graphical structures that
tell us something about cause-and-effect relationships. The causal
representation provided by these algorithms enables transparency and
explainability, which is necessary for decision making in critical real-world
problems. Yet, causal ML has had limited impact in practice compared to
associational ML. This paper investigates the challenges of causal ML with
application to COVID-19 UK pandemic data. We collate data from various public
sources and investigate what the various structure learning algorithms learn
from these data. We explore the impact of different data formats on algorithms
spanning different classes of learning, and assess the results produced by each
algorithm, and groups of algorithms, in terms of graphical structure, model
dimensionality, sensitivity analysis, confounding variables, predictive and
interventional inference. We use these results to highlight open problems in
causal structure learning and directions for future research. To facilitate
future work, we make all graphs, models, data sets, and source code publicly
available online
Hybrid Bayesian network discovery with latent variables by scoring multiple interventions
In Bayesian Networks (BNs), the direction of edges is crucial for causal
reasoning and inference. However, Markov equivalence class considerations mean
it is not always possible to establish edge orientations, which is why many BN
structure learning algorithms cannot orientate all edges from purely
observational data. Moreover, latent confounders can lead to false positive
edges. Relatively few methods have been proposed to address these issues. In
this work, we present the hybrid mFGS-BS (majority rule and Fast Greedy
equivalence Search with Bayesian Scoring) algorithm for structure learning from
discrete data that involves an observational data set and one or more
interventional data sets. The algorithm assumes causal insufficiency in the
presence of latent variables and produces a Partial Ancestral Graph (PAG).
Structure learning relies on a hybrid approach and a novel Bayesian scoring
paradigm that calculates the posterior probability of each directed edge being
added to the learnt graph. Experimental results based on well-known networks of
up to 109 variables and 10k sample size show that mFGS-BS improves structure
learning accuracy relative to the state-of-the-art and it is computationally
efficient
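One simple way to turn per-edge scores into posterior probabilities of the kind the abstract describes is to normalise log marginal-likelihood scores across the candidate orientations of an edge under a uniform prior. This is an illustrative sketch, not the mFGS-BS scoring paradigm itself; the scores shown are made up:

```python
import math

def edge_posteriors(edge_scores):
    """Map log marginal-likelihood scores of mutually exclusive edge
    candidates to normalised posterior probabilities (uniform prior)."""
    m = max(edge_scores.values())           # subtract max for stability
    w = {e: math.exp(s - m) for e, s in edge_scores.items()}
    z = sum(w.values())
    return {e: v / z for e, v in w.items()}

# hypothetical log-scores for the two orientations of an undirected edge A-B
post = edge_posteriors({("A", "B"): -100.0, ("B", "A"): -103.0})
```

A score difference of 3 nats already yields roughly 95% posterior mass on the better orientation, which is why even modest score gaps can orientate edges confidently.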